A Multiclass Classification of Cancer Data: Using a Kernel Based Clustering k-NN Support Vector Machine
Authors
Abstract
Support vector machines (SVMs) are promising classification methods because their solid mathematical foundations give them several salient properties that other methods hardly provide. Despite these properties, however, SVMs are not favored for large-scale data, because their complexity depends heavily on the size of the data set. Microarray gene expression data usually have a large number of dimensions (thousands of genes) and a small number of samples (e.g., a few tens of patients). This paper presents a novel and efficient approach, dimensionality reduction by t-test followed by Clustering k-NN Support Vector Machines (CK-SVM), which is specially designed for handling very large data sets such as microarray gene expression data. The CK-SVM classifies by reflecting the degree to which a training data point acts as a support vector, using a Gaussian function together with k-nearest neighbor (k-NN) and the Euclidean distance measure. To add a local control property, a simple clustering scheme is applied before the Gaussian functions are constructed for each cluster. In addition, probabilistic SVM outputs are used to extend binary classification to multiclass classification in a pairwise manner. The resulting multiclass classifier is applied to cancer data, represented by the SRBCT data set.
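As a rough illustration of the t-test dimensionality reduction step mentioned in the abstract, the sketch below ranks genes by the absolute value of a two-sample t-statistic between a pair of classes and keeps the top k. The abstract does not say which t-test variant or cutoff is used, so the Welch form, the function names `welch_t` and `top_genes`, and the toy data are all assumptions made for illustration, not the authors' implementation.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for one gene's values in two classes (assumed variant)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def top_genes(X, y, class_a, class_b, k):
    """Return indices of the k genes with the largest |t| between two classes.

    X is a list of samples (rows), each a list of gene expression values;
    y holds the class label of each sample. Each class needs >= 2 samples.
    """
    rows_a = [x for x, label in zip(X, y) if label == class_a]
    rows_b = [x for x, label in zip(X, y) if label == class_b]
    n_genes = len(X[0])
    scores = [
        abs(welch_t([r[g] for r in rows_a], [r[g] for r in rows_b]))
        for g in range(n_genes)
    ]
    # Sort gene indices by score, highest first, and keep the first k.
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:k]


# Made-up toy data: 6 samples, 3 genes; only gene 1 clearly separates the classes.
X = [
    [1.0, 5.0, 2.0],
    [1.2, 5.2, 1.9],
    [0.9, 4.9, 2.1],
    [1.1, 0.1, 2.0],
    [1.0, 0.2, 2.2],
    [0.8, 0.0, 1.8],
]
y = ["A", "A", "A", "B", "B", "B"]
selected = top_genes(X, y, "A", "B", k=1)
```

In a pairwise multiclass setting such as the one the abstract describes, a selection like this could be repeated for each pair of classes before training the corresponding binary classifier; whether the paper selects genes per pair or once globally is not stated.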
Similar resources
A comparative study of performance of K-nearest neighbors and support vector machines for classification of groundwater
The aim of this work is to examine the feasibilities of the support vector machines (SVMs) and K-nearest neighbor (K-NN) classifier methods for the classification of an aquifer in the Khuzestan Province, Iran. For this purpose, 17 groundwater quality variables including EC, TDS, turbidity, pH, total hardness, Ca, Mg, total alkalinity, sulfate, nitrate, nitrite, fluoride, phosphate, Fe, Mn, Cu, ...
MULTI CLASS BRAIN TUMOR CLASSIFICATION OF MRI IMAGES USING HYBRID STRUCTURE DESCRIPTOR AND FUZZY LOGIC BASED RBF KERNEL SVM
Medical Image segmentation is to partition the image into a set of regions that are visually obvious and consistent with respect to some properties such as gray level, texture or color. Brain tumor classification is an imperative and difficult task in cancer radiotherapy. The objective of this research is to examine the use of pattern classification methods for distinguishing different types of...
Identifying Efficient Kernel Function in Multiclass Support Vector Machines
Support vector machine (SVM) is a novel kernel-based pattern classification method that is significant in many areas such as data mining and machine learning. A unique strength is the use of a kernel function to map the data into a higher-dimensional feature space. In training an SVM, the kernel and its parameters play a vital role in classification accuracy. Therefore, a suitable kernel design and it...
Support Vector Machine for Multiclass Handwritten Digits
In our research paper, we have implemented Multiclass Classification using Support Vector Machine (SVM). Pen Digit Recognition of Handwritten digit dataset is used for the purpose. One vs All approach has been applied using SVM to achieve multiclass classification. The same approach with different kernels has been analysed to select the right kernel. In this paper, we have found that selection ...
Journal of Machine Learning Research X (2008) 1-34 Submitted 01/08; Revised 08/08; Published XX/XX
Multiple kernel learning (MKL) aims at simultaneously learning a kernel and the associated predictor in supervised learning settings. For the support vector machine, an efficient and general multiple kernel learning algorithm, based on semi-infinite linear programming, has been recently proposed. This approach has opened new perspectives since it makes MKL tractable for large-scale problems, by ...